Cal Poly Masters Seminar
Cal Poly - San Luis Obispo
University of Nebraska - Lincoln
University of Nebraska - Lincoln
2024-07-29
Introduction to Graphical Testing
Statistical Lineups
‘You Draw It’ Method
Recap: Eye Fitting Straight Lines in the Modern Era
Subsequent + Future Work
Discussion
Data visualization is defined as the art of drawing graphical charts in order to display data (Unwin 2020).
Timeline of Infographics by RJ Andrews (Info We Trust Data Storyteller).
Evaluate design choices and understand cognitive biases through the use of visual tests.
Could ask participants to:
identify differences in graphs.
read information off of a chart accurately.
use data to make correct real-world decisions.
predict the next few observations.
Carpenter and Shah (1998) identify pattern recognition, interpretive processes, and integrative processes as the strategies required to complete graph-reading tasks of varying complexity.
Pattern recognition requires the viewer to encode graphic patterns.
Interpretive processes operate on those patterns to construct meaning.
Integrative processes then relate the meanings to the contextual scenario as inferred from labels and titles.
When doing exploratory data analysis, how do we know if what we see is actually there?
Embed a target plot (actual data) in a lineup of null plots (randomly permuted data sets).
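A minimal sketch of how the data behind a lineup could be assembled, assuming null plots are produced by permuting the response values. The function name `make_lineup` and the default of 20 panels are illustrative choices, not details from the talk.

```python
import random

def make_lineup(x, y, n_panels=20, seed=None):
    """Return (panels, target_index).

    One panel holds the actual (x, y) data; every other panel holds the
    same x values paired with a random permutation of y (a null data set).
    """
    rng = random.Random(seed)
    target_index = rng.randrange(n_panels)  # hidden position of the real data
    panels = []
    for i in range(n_panels):
        if i == target_index:
            panels.append(list(zip(x, y)))       # target plot: actual data
        else:
            y_perm = list(y)
            rng.shuffle(y_perm)                  # breaks any x-y association
            panels.append(list(zip(x, y_perm)))  # null plot data
    return panels, target_index

# Hypothetical example data with a strong linear trend
x = list(range(10))
y = [2 * xi + 1 for xi in x]
panels, target = make_lineup(x, y, n_panels=20, seed=1)
```

If viewers can pick out panel `target` more often than chance would allow, the structure in the actual data is visually distinguishable from the null.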
The principle of simple linear regression is to find the line (i.e., determine its equation) which passes as close as possible to the observations, that is, the set of points.
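For ordinary least squares, "as close as possible" means minimizing the sum of squared vertical distances between the points and the line. A minimal sketch of the closed-form solution follows; the function name `ols_line` and the example values are illustrative.

```python
def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sums of cross-products and squares about the means
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar  # the line passes through (x_bar, y_bar)
    return intercept, slope

# Hypothetical example data
intercept, slope = ols_line([1, 2, 3, 4], [2, 4, 5, 8])
```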
OR VISIT bit.ly/3BF56Zj
Big Idea: How do statistical regression results compare to intuitive, visually fitted results?
Mosteller et al. (1981)
Readers are asked to input their own assumptions about various metrics and compare how these assumptions relate to reality.
Study Participant Prompt: Use your mouse to fill in the trend in the yellow box region.
Robinson, Howard, and VanderPlas (2023) provide details on the development and implementation of the ‘You Draw It’ method in R.
youdrawitR package (Dillon Murphy, Google Summer of Code 2023).
D3.js is to JavaScript as ggplot2 is to R (kind of…)
Codecademy: Introduction to JavaScript
Understand SVG elements: inspect elements in web browser!
Amelia Wattenberger’s Full Stack D3 and Data Visualization Book
Build a basic graphic using r2d3
Modify D3.js code until it does what you want!
Additional Resources
How to learn D3 with no coding experience
Kiegan’s ISU Graphics Group Presentation from Mar 25, 2021
Amelia Wattenberger on Twitter (or “X” now?)
Validate ‘You Draw It’ as a method for graphical testing, comparing results to the less technological method utilized in Mosteller et al. (1981).
Extend the study with formal statistical analysis methods in order to better understand the perception of linear regression.
\(N = 30\) points \((x_i, y_i)\), \(i = 1, \ldots, N\), were generated for \(x_i \in [x_{min}, x_{max}]\).
Data were simulated based on a linear model with additive errors: \[\begin{equation} y_i = \beta_0 + \beta_1 x_i + e_i \end{equation}\]
where \(e_i \sim N(0, \sigma^2).\)
Parameters \(\beta_0\) and \(\beta_1\) were selected to reflect the four data sets used in Mosteller et al. (1981).
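The simulation step can be sketched as follows. This is an illustration under stated assumptions: the \(x_i\) are drawn uniformly on \([x_{min}, x_{max}]\) (the slides only say they lie in that interval), and the parameter values in the usage line are hypothetical placeholders, not the values matched to Mosteller et al. (1981).

```python
import random

def simulate_points(beta0, beta1, sigma, x_min, x_max, n=30, seed=None):
    """Simulate n points from y_i = beta0 + beta1 * x_i + e_i, e_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    # Assumption: x values are uniform on [x_min, x_max]
    xs = sorted(rng.uniform(x_min, x_max) for _ in range(n))
    return [(x, beta0 + beta1 * x + rng.gauss(0, sigma)) for x in xs]

# Hypothetical parameter values, for illustration only
points = simulate_points(beta0=5.0, beta1=0.5, sigma=1.0,
                         x_min=0.0, x_max=20.0, seed=42)
```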
Participants recruited through Twitter, Reddit, and direct email in May 2021.
A total of 35 individuals completed 119 unique ‘You Draw It’ task plots.
Data sets were generated randomly, independently for each participant at the start of the experiment.
Participants were shown 2 practice plots followed by 4 task plots, randomly assigned for each individual in a completely randomized design.
The experiment was conducted and distributed through an RShiny application.
For each participant, the final data set used for analysis contains:
\(x_{ijk}\), \(y_{ijk,drawn}\), \(\hat y_{ijk,OLS}\), \(\hat y_{ijk,PCA}\)
for
parameter choice \(i = 1, 2, 3, 4\),
participant \(j = 1, \ldots, N_{participant}\),
\(x_{ijk}\) value corresponding to increment \(k = 1, \ldots, 4 x_{max} + 1\).
Vertical residuals between the drawn and fitted values were calculated as:
\(e_{ijk,OLS} = y_{ijk,drawn} - \hat y_{ijk,OLS}\)
\(e_{ijk,PCA} = y_{ijk,drawn} - \hat y_{ijk,PCA}\)
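A minimal sketch of the two fitted lines and the vertical residuals. It assumes the PCA line is the first principal axis of the unscaled \((x, y)\) data passing through the centroid; the study's exact PCA settings are not given on the slides, and the function names are illustrative.

```python
import math

def _moments(x, y):
    """Means and centered sums of squares and cross-products."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return x_bar, y_bar, s_xx, s_yy, s_xy

def ols_line(x, y):
    """(intercept, slope) minimizing squared vertical distances."""
    x_bar, y_bar, s_xx, _, s_xy = _moments(x, y)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

def pca_line(x, y):
    """(intercept, slope) of the first principal axis (assumes s_xy != 0).

    This line minimizes squared perpendicular distances to the points.
    """
    x_bar, y_bar, s_xx, s_yy, s_xy = _moments(x, y)
    diff = s_yy - s_xx
    slope = (diff + math.sqrt(diff ** 2 + 4 * s_xy ** 2)) / (2 * s_xy)
    return y_bar - slope * x_bar, slope

def vertical_residuals(x_grid, y_drawn, line):
    """Drawn value minus fitted value at each x in the grid."""
    intercept, slope = line
    return [yd - (intercept + slope * xg) for xg, yd in zip(x_grid, y_drawn)]

# Hypothetical data and a hypothetical drawn line
x = [1, 2, 3, 4]
y = [2, 4, 5, 8]
x_grid = [1 + 0.25 * k for k in range(13)]   # quarter-unit increments
y_drawn = [2.0 * xg for xg in x_grid]
e_ols = vertical_residuals(x_grid, y_drawn, ols_line(x, y))
e_pca = vertical_residuals(x_grid, y_drawn, pca_line(x, y))
```

With noisy data the principal-axis slope is at least as steep in magnitude as the OLS slope, so the two sets of residuals diverge most toward the ends of the x range.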
The Linear Mixed Model equation for each fit (OLS and PCA) residuals is given by: \[\begin{equation} e_{ijk,fit} = \left[\gamma_0 + \alpha_i\right] + \left[\gamma_{1} x_{ijk} + \gamma_{2i} x_{ijk}\right] + p_{j} + \epsilon_{ijk} \end{equation}\]
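To make the roles of the terms concrete, the sketch below generates residuals from a model of this form. It assumes \(p_j\) is a normally distributed participant effect and \(\epsilon_{ijk}\) is normally distributed residual error (the slides do not define them). Every numeric value is a hypothetical placeholder, and the sketch generates data from the model rather than fitting it.

```python
import random

def simulate_lmm_residuals(gamma0, alpha, gamma1, gamma2,
                           sigma_p, sigma_e, x_grid, n_participants, seed=None):
    """Generate rows (i, j, k, x, e) from
    e_ijk = [gamma0 + alpha_i] + [gamma1 * x + gamma2_i * x] + p_j + eps_ijk.
    """
    rng = random.Random(seed)
    rows = []
    for j in range(n_participants):
        p_j = rng.gauss(0, sigma_p)          # participant effect (assumed normal)
        for i in range(len(alpha)):          # parameter choice
            for k, x in enumerate(x_grid):   # x increment
                mean = (gamma0 + alpha[i]) + (gamma1 + gamma2[i]) * x
                eps = rng.gauss(0, sigma_e)  # residual error (assumed normal)
                rows.append((i, j, k, x, mean + p_j + eps))
    return rows

# Hypothetical values: four parameter choices, three participants
rows = simulate_lmm_residuals(
    gamma0=0.1, alpha=[0.0, 0.2, -0.1, 0.3],
    gamma1=0.05, gamma2=[0.0, 0.01, -0.02, 0.03],
    sigma_p=0.5, sigma_e=0.2,
    x_grid=[0.25 * k for k in range(9)], n_participants=3, seed=7)
```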
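The Generalized Additive Mixed Model equation for each fit (OLS and PCA) residuals is given by: \[\begin{equation} e_{ijk,fit} = \alpha_i + s_{i}(x_{ijk}) + p_{j} + s_{j}(x_{ijk}) \end{equation}\] where \(\alpha_i\) is the intercept for parameter choice \(i\), \(s_i\) is a smooth function of \(x\) for parameter choice \(i\), \(p_j\) is the effect of participant \(j\), and \(s_j\) is a smooth function of \(x\) specific to participant \(j\).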
Research Objectives:
Validate ‘You Draw It’ as a method for graphical testing, comparing results to the less technological method utilized in Mosteller et al. (1981).
Extend the study found in Mosteller et al. (1981) with formal statistical analysis methods for understanding the perception of linear regression.
Results:
Estimated drawn trend lines followed the regression line based on the principal axes more closely than the OLS regression line.
Most prominent in data simulated with large variances.
Humans perform “ensemble perception” in a statistical graphic setting.
The reproducibility of these results serves as validation of the ‘You Draw It’ tool and method.
Collected a large crowdsourced sample via Prolific
Implemented ‘You Draw It’ method to measure predictions for exponential growth on log and linear scales
youdrawitR package – Goal to publish on CRAN
One-to-many function relationships
Use of ‘You Draw It’ in the classroom as an educational tool